Verb vocabularies are shaped by complex meanings from the onset of development

Verbs and nouns vary in many ways – including in how they are used in language and in the timing of their early learning. We compare the distribution of semantic features that comprise early-acquired verb and noun meanings. Given overall semantic and syntactic differences between nouns and verbs, we hypothesized that the preference for directly perceptible features observed for nouns would be attenuated for verbs. Building on prior work using semantic features and semantic networks in nouns, we find that compared to early-learned nouns (N = 359), early-learned verbs (N = 103) have meanings disproportionately built from complex information inaccessible to the senses. Further, children’s early verb vocabularies (N = 3,804) show semantic relationships strongly shaped by this complex information from the beginning of vocabulary development. Complexity is observed in early verb meanings and is reflected in the vocabularies of children even at the outset of verb learning.

Evidence suggests that simple and accessible information is preferred in early vocabularies. Early-learned nouns, for instance, tend to have high concreteness and imageability (e.g., Hansen, 2017). Early verbs demonstrate similar patterns favoring easily accessible perceptual information. For example, perceptual similarity of participants across events helps toddlers extend novel verbs to dissimilar participants (Childers et al., 2016; Haryu, Imai, & Okada, 2011).
Nevertheless, verb meanings may comprise different kinds of information than noun meanings. Given that verbs are less imageable and concrete than nouns, verb meaning may reflect less directly accessible semantic information compared to nouns. As such, early verb learning requires learning about the wealth of verb-specific semantic detail involved in verbs' thematic roles. For example, adults know that the meaning of the verb "frighten" includes knowledge about its patient: not only the basic fact that the patient is likely affected (Dowty, 1991), but also specific information that things that are frightened tend to be "scared", "small", "weak", and "helpless" (McRae, Ferretti, & Amyote, 1997). While some parts of this information are directly perceptible (e.g., "small"), other parts implicate more advanced knowledge (e.g., "helpless"). The extent to which such socially determined, complex information is prioritized in early verb meanings is unknown, but such information is likely pervasive.
In this study, we ask two questions about verb learning in young children. First, are the meanings of early-learned verbs composed of different kinds of semantic information than those of early-learned nouns? And second, how do different kinds of semantic information contribute to the semantic relationships among early-learned nouns and verbs?
Word meaning has been fruitfully decomposed into tractable subcomponents by developing and analyzing lists of semantic microfeatures that are produced by adults in response to concepts (McRae, Cree, Seidenberg, & McNorgan, 2005; Vinson & Vigliocco, 2008). For example, the semantic features of the word "tree" include <is tall> and <has leaves>. Semantic features account for patterns of word use and understanding in adults (e.g., McRae et al., 1997; Pexman, Hargreaves, Siakaluk, Bodner, & Pope, 2008) and children (e.g., Engelthaler & Hills, 2017; Hills, Maouene, Maouene, Sheya, & Smith, 2009; Peters & Borovsky, 2019; Stella, Beckage, Brede, & De Domenico, 2018). Semantic features can be categorized according to the type of information they encode (Cree & McRae, 2003). Perceptual features encode sensorimotor information about meaning (e.g., <is red>, <is greasy>). Functional features are associated with information about interactions with objects (e.g., <used for transportation>, <is eaten>). Taxonomic features refer to hierarchical relationships among concepts (e.g., <a vehicle>, <a food>). Finally, encyclopedic features are those that do not fit into the other categories (e.g., <is fun>, <is poisonous>) and are often associated with social factors, affect/emotion, and decontextualized knowledge (Cree & McRae, 2003). As such, encyclopedic features often describe complex meanings that cut across the other feature types. In fact, these features have occasionally been excluded from semantic-feature-based analyses of early vocabulary development because they have been assumed to be inaccessible and too complex (e.g., Hills et al., 2009), though we follow other work that includes them (Peters & Borovsky, 2019).
Among these feature types, perceptual and taxonomic features have been repeatedly implicated as shaping children's early noun vocabularies (Engelthaler & Hills, 2017; Peters & Borovsky, 2019). For example, nouns with more perceptual features tend to have earlier ages of acquisition (AOAs), even after controlling for word frequency and the number of other feature types (Peters & Borovsky, 2019). Perceptual features may be particularly important to early vocabulary development because high concreteness (e.g., Gilhooly & Logie, 1980), imageability (e.g., McDonough et al., 2011), perceptual accessibility (Della Rosa, Catricalà, Vigliocco, & Cappa, 2010), and perceptual salience (Pruden, Hirsh-Pasek, Golinkoff, & Hennon, 2006) facilitate word learning, and perceptual features tend to encode such information.
Perceptual features may play a similar role for verbs. In general, verbs are better learned and recognized by young children when they have participants that are more familiar (Kersten & Smith, 2002) and more physically similar to those encountered in prior events (Childers et al., 2016; Haryu et al., 2011). These findings suggest that verb semantics in early vocabulary acquisition may be driven by perceptual information. Non-perceptual information may come to support verb semantics only later in vocabulary acquisition. In general, the perceptual feature advantage in noun learning has been argued to reflect a perceptual-to-conceptual shift in language processing (Quinn & Eimas, 1997, 2000). This idea suggests that early noun learning depends on perceptually accessible features that are only later supplemented by more complex features, a transition that may also occur with verbs.
Alternatively, the four feature types may differently impact noun and verb semantics. This difference may be driven by differences in the concepts and actions that nouns and verbs refer to in events. Verbs often serve a coordinating role in sentences describing events, describing how one participant affects or interacts with another. Such interactions among participants may be particularly salient for younger children, who tend to prioritize associative or functional links among objects (e.g., <is eaten>) in some semantic processing tasks (e.g., Smiley & Brown, 1979).
The potential importance of functional features for verbs is consistent with theories suggesting that children undergo a thematic-to-taxonomic shift in vocabulary development (Inhelder & Piaget, 1964; Smiley & Brown, 1979). Under this theory, early verb semantics may expose event associations between verbs and nouns through functional features. For example, an early-learned verb like "eat" has clear functional relationships with a noun like "cookie" because both words tend to occur in similar event contexts like mealtimes. Such features could highlight likely relationships between objects and actions and therefore serve as important aspects of verb meaning. This idea is consistent with evidence that children's early complex play behaviors are often associated with the typical functional uses of objects (e.g., Zelazo & Kearsley, 1980), suggesting that such functional information is important to children's processing of events and, ultimately, their learning of verbs.
Encyclopedic features might also play a larger role in verb learning than in noun learning. This idea is supported by findings that children need to learn the properties of verb meanings and associated event participants on a verb-by-verb basis (Alishahi & Stevenson, 2010; Meints, Plunkett, & Harris, 2008; Yuan, Fisher, Kandhadai, & Fernald, 2011). Such verb-specific features likely detail knowledge about common attributes of event participants, like social roles, behaviors, or relationships, or emotional state and affect, that are not captured by other feature types. Work on school-age children suggests that the emotional valence of words influences their processing and learning, suggesting that such social and emotional information may also influence early verb learning (Ponari, Norbury, & Vigliocco, 2018, 2020).
Before we examine how feature type influences children's early vocabularies, we first describe the overall vocabulary challenge faced by children by examining the relative feature composition of nouns and verbs in a large sample of early-learned words. We predicted that functional and encyclopedic features would make up relatively more of the feature composition of verbs compared to nouns and that perceptual and taxonomic features would be more abundant for nouns compared to verbs.

Experiment 1 Method
Semantic features for all of the nouns (N = 359) and verbs (N = 103) on the American English-language version of the MacArthur-Bates Communicative Development Inventory: Words and Sentences (MBCDI) were used (Borovsky, Peters, Cox, & McRae, Under review; Kueser, Horvath, & Borovsky, In prep.; McRae et al., 2005). The MBCDI is a checklist of early vocabulary items completed by caregivers for children aged 16 to 30 months. The semantic feature data sets were collected by asking adult participants to describe features of the words; the raw participant responses were standardized into semantic features shared across nouns and verbs. The resultant features were then categorized by type (Cree & McRae, 2003).
We counted the number of encyclopedic, functional, perceptual, and taxonomic features for each word. There were similar total raw numbers of features for nouns, M = 13.15, SD = 3.42, and verbs, M = 12.41, SD = 3.97, t(148.24) = 1.72, p = .088. To avoid any small differences in raw number biasing the results, we divided each count by the total number of features associated with each word, resulting in a set of proportions of the relative amount of encyclopedic, functional, perceptual, and taxonomic information for each noun and verb.
The relative proportion of each feature type across nouns and verbs was analyzed using ANOVA. The dependent variable was proportion and the independent variables were feature type and part of speech. Models and post-hoc tests were conducted using R version 4.1.1 (R Development Core Team, 2008) and the emmeans package version 1.8.1-1 (Lenth, 2019).
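The proportion calculation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `feature_proportions` and the toy word are hypothetical, and the input is assumed to be a list holding one type label per feature of a word.

```python
from collections import Counter

FEATURE_TYPES = ("encyclopedic", "functional", "perceptual", "taxonomic")

def feature_proportions(feature_types):
    """Return the proportion of each feature type among one word's features."""
    counts = Counter(feature_types)
    total = sum(counts[t] for t in FEATURE_TYPES)
    if total == 0:
        raise ValueError("word has no categorized features")
    # Dividing by the word's own total removes differences in raw feature count.
    return {t: counts[t] / total for t in FEATURE_TYPES}

# Hypothetical word with 8 features (toy data, not taken from the study).
toy_word = ["perceptual"] * 4 + ["encyclopedic"] * 2 + ["functional", "taxonomic"]
props = feature_proportions(toy_word)
```

Because each word's proportions sum to one, words with many features and words with few features contribute on the same scale.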

Results
Figure 1 illustrates the relative proportion of features of each feature type across the MBCDI nouns and verbs. Several patterns emerged in the statistical analyses of these proportions, the results of which are presented in Table 1. There was a significant main effect of feature type, F(3) = 261.5, p < .001. Across nouns and verbs, encyclopedic and perceptual features were most abundant. The average proportion of encyclopedic features relative to all features was 0.41, 95% CI: [0.39, 0.42]. The average proportion of perceptual features was 0.34, 95% CI: [0.33, 0.36]. The average proportion of taxonomic features was 0.13, 95% CI: [0.12, 0.15]. Last, the average proportion of functional features was 0.12, 95% CI: [0.10, 0.13].

Experiment 2
Experiment 1 demonstrated that early-acquired nouns and verbs differ in their semantic feature composition. While both nouns and verbs tended to have proportionally more encyclopedic and perceptual features than functional and taxonomic features, verbs prioritized encyclopedic features and nouns prioritized perceptual and functional features. Overall, these patterns indicated that early-learned verbs' meanings have more complex and less perceptually accessible information than early-learned nouns.
How might children use these noun-verb differences in semantic feature information during learning? On one hand, children may ignore complex semantic information like encyclopedic features and focus on more accessible perceptual information. Alternatively, given the preponderance of encyclopedic features in early-acquired verbs' meanings, children might develop strategies to understand this more conceptually complex information. In order to expand their verb vocabularies, children may need to learn to consider aspects of meaning beyond directly perceptible features.
Semantic network modelling can help to distinguish among these options. Semantic networks treat words as nodes in a network and connect words through shared semantic features. (Figure 2 shows an example semantic network.) Semantic structure among words can be examined by measuring the strength of semantic connections across the words in the network. For example, in Figure 2, "kitty" exhibits semantic structure characterized by strong semantic connections to its neighbors, whereas "playground" has semantic structure characterized by having no semantic connections to any other word. Patterns of semantic structure in vocabulary networks predict words' age of acquisition (Beckage & Colunga, 2019; Borovsky, Ellis, Evans, & Elman, 2016; Engelthaler & Hills, 2017; Fourtassi, Bian, & Frank, 2020; Hills, Maouene, Riordan, & Smith, 2010; Hills et al., 2009; Peters & Borovsky, 2019; Sailor, 2013; Stella et al., 2018; Steyvers & Tenenbaum, 2005) and other aspects of word processing (Borovsky, 2020, 2022; Peters, Kueser, & Borovsky, 2021). Examining patterns of semantic structure between nouns and verbs may help to identify which kinds of semantic features serve to connect these words to neighbors and which are prioritized in early learning. Some evidence suggests that perceptual feature content may be a robust driver of semantic structure in early noun learning (Peters & Borovsky, 2019). In that study, the authors created semantic networks with the nouns on the MBCDI and their semantic relationships as defined by shared semantic features. Nouns that were directly connected to many other nouns through shared perceptual features tended to have earlier AOAs; the effect of other semantic feature types was less pronounced. Such a pattern may also occur for verbs.
However, given that Experiment 1 suggested that verb meaning more heavily relies on encyclopedic features, verb semantic structure may also reflect this tendency. Such a pattern might be seen in verbs' semantic connections to other words being primarily composed of encyclopedic features compared to other features. Only one study has examined verb-specific network semantic structure (Kueser, Horvath, & Borovsky, Under review). In that study, verb and noun semantic structure demonstrated systematic differences. For example, while early-learned nouns tended to demonstrate strong semantic connections with other nouns throughout vocabulary development, early-learned verbs had relatively weaker direct semantic connections to other verbs. However, that study did not consider how different semantic features may contribute to semantic structure.
In Experiment 2, we expand on the prior work showing that the semantic network structure of children's early vocabularies demonstrates differences across feature types. Here, we measure how nouns' and verbs' semantic structure in children's early vocabularies differs as a function of feature type. Importantly, we account for the differences in semantic feature composition between nouns and verbs observed in Experiment 1 and for differences in quantities of nouns and verbs in vocabularies by normalizing the semantic network measures with respect to random networks composed of each feature type with controlled numbers of nouns and verbs. In this way, we ask whether nouns and verbs differ in semantic connections to other words compared to what would be expected given the baseline feature and vocabulary composition of nouns and verbs.

Method
MBCDI Vocabulary Data: Data for children's early vocabularies came from administrations of the MBCDI vocabulary checklist (Fenson et al., 2007) stored in the WordBank MBCDI database (Frank, Braginsky, Yurovsky, & Marchman, 2017). As we were focused on understanding patterns of typical development, children were included in the data set if they had productive vocabulary size percentiles greater than the 20th percentile, a commonly used cutoff separating children with typical development from those who are late talkers (e.g., Beckage et al., 2011; Feldman et al., 2005). The final sample size was 3,804 children (1,437 female, 1,422 male, 945 unknown).
Semantic Network Creation and Measurement: Semantic networks were created for each child by adding the words produced by the child as nodes. Edges between words were established if words shared semantic features. All features were included. Edges were undirected and weighted to represent the number of shared semantic features between words. For each child, separate semantic networks were created using perceptual, functional, taxonomic, and encyclopedic features only. We used graph-tool version 2.44 (Peixoto, 2014) running on Python version 3.8.12.
Weighted degree was measured for each word in each network for each child (see Figure 3 for an example). Weighted degree measures the sum of the weights of the edges between a node and the nodes to which it is connected. Weighted degree is high when words are strongly connected to their neighbors and low when words are unconnected or only weakly connected. Last, we calculated the average weighted degree for the nouns and for the verbs within each network type for each child.
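As an illustration of the network construction and the weighted degree measure, the sketch below builds a small network with plain Python dictionaries rather than graph-tool. The words, feature labels, and function names are hypothetical toy values chosen for the example, not data or code from the study.

```python
from itertools import combinations

def build_network(word_features):
    """Map each unordered word pair to its number of shared features (edge weight)."""
    edges = {}
    for a, b in combinations(sorted(word_features), 2):
        shared = len(word_features[a] & word_features[b])
        if shared > 0:  # words with no shared features stay unconnected
            edges[(a, b)] = shared
    return edges

def weighted_degree(word, edges):
    """Sum of the weights of all edges that touch `word`."""
    return sum(w for pair, w in edges.items() if word in pair)

def mean_weighted_degree(words, edges):
    """Average weighted degree over a set of words (e.g., one child's nouns)."""
    words = list(words)
    return sum(weighted_degree(w, edges) for w in words) / len(words)

# Toy vocabulary with made-up features for a single feature-type network.
toy = {
    "kitty": {"has_fur", "is_small", "is_soft"},
    "puppy": {"has_fur", "is_small", "barks"},
    "bunny": {"has_fur", "is_soft", "hops"},
    "playground": {"has_slides"},
}
edges = build_network(toy)
kitty_degree = weighted_degree("kitty", edges)
playground_degree = weighted_degree("playground", edges)
```

In this toy network "kitty" shares two features with each of "puppy" and "bunny", so its weighted degree is 4, while "playground" shares no features with any other word and has a weighted degree of 0.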
Network normalization procedure: Given the result in Experiment 1 that nouns and verbs differed in the relative proportion of perceptual, functional, taxonomic, and encyclopedic features, nouns and verbs would be expected to demonstrate differences in feature-based networks. However, these differences would be relatively uninformative with respect to answering the question about whether children's early vocabularies preferentially consist of nouns or verbs demonstrating stronger perceptual connections relative to other feature types. To address this potential limitation, we normalized the raw weighted degrees in each network type, separately for nouns and for verbs.
Another reason for normalizing the raw network measure is to account for the fact that there are different numbers of nouns and verbs on the MBCDI (or in children's vocabularies in general, given nouns' earlier age of acquisition). Our procedure accounts for that fact by comparing the observed network values to randomly generated networks with the same numerical composition of nouns and verbs. Specifically, we generated 868,000 random networks across different noun-verb vocabulary sizes for each feature type. The random networks consisted of random selections of nouns and verbs from the MBCDI. Within each randomly generated network, we measured each word's weighted degree and separately calculated the average of weighted degree for nouns and verbs across each random network. Using the distribution of weighted degree in the random networks, we calculated percentile ranks for the raw values in the children's networks. This was done using the random networks of the same noun-verb vocabulary size as the child's network. For example, for a child with 10 nouns and three verbs, we first calculated the child's raw weighted degree for nouns and for verbs and then compared those raw degrees to the distribution of degree for nouns and for verbs across the random networks with the same noun-verb vocabulary size.
Instead of directly sampling all random networks across the noun-verb vocabulary space (which had 3,279 distinct combinations across our sample), we strategically sampled from this space. First, we randomly sampled 15% of the noun-verb vocabulary combinations that lay within two standard deviations of the center of the space as identified with principal components analysis. Second, we sampled all noun-verb vocabulary size combinations that lay outside of the two-standard-deviations area. Third, we sampled along the edges of the noun-verb vocabulary size space. These steps resulted in a final sample of 868 points in the noun-verb vocabulary size space.
For each of the 868 noun-verb vocabulary size combinations, 1,000 randomly generated networks were created from random sets of nouns and verbs for each of the feature types. We calculated weighted degree for each noun and verb in each of these networks and then, on a network-by-network basis, calculated the average weighted degree for each part of speech. For each feature-type network, we then created a three-dimensional histogram for nouns and for verbs of the average weighted degrees across the noun-verb vocabulary space. Specifically, this was a three-dimensional array with dimensions corresponding to noun vocabulary size × verb vocabulary size × weighted degree histogram for nouns or for verbs.
The three-dimensional histogram was used to estimate unsampled points in the noun-verb vocabulary space. We linearly interpolated weighted degree across two-dimensional noun-verb vocabulary size slices of the array using the LinearNDInterpolator function in the Python package scipy version 1.8.0 (Virtanen et al., 2020). Values within the weighted degree histogram columns were then normalized so their sum was one. Last, we used kernel density estimation from the Python package KDEpy version 1.1.0 with bandwidth equal to two bins to slightly smooth the histograms to avoid artifacts from the binning and interpolation procedures (Odland, 2018).
The interpolated histogram was then used to normalize the raw weighted degree values from the children's vocabulary networks. We report percentile ranks that correspond to the proportion of random networks with values at or below a child's observed weighted degree. For example, if a child's average weighted degree for nouns in the perceptual network was 10, we would use the interpolated histogram at the same vocabulary size for the perceptual network and calculate the proportion of randomly generated networks that had a weighted degree for nouns at or below 10. This proportion was then multiplied by 100 and served as the percentile rank that we report in our results. We call this quantity "normalized degree".
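The percentile-rank step can be illustrated with a short sketch. Note the simplifying assumption: it ranks the observed value against a raw list of baseline values, whereas the procedure above uses an interpolated and smoothed histogram. The function name and the toy baseline values are hypothetical.

```python
from bisect import bisect_right

def normalized_degree(observed, baseline_values):
    """Percentile rank: percentage of baseline values at or below `observed`."""
    ordered = sorted(baseline_values)
    # bisect_right counts every baseline value <= observed.
    at_or_below = bisect_right(ordered, observed)
    return 100.0 * at_or_below / len(ordered)

# Toy baseline of average noun weighted degrees from ten random networks.
toy_baseline = [4, 6, 8, 9, 10, 10, 11, 12, 14, 15]
rank = normalized_degree(10, toy_baseline)
```

Here six of the ten toy baseline values are at or below 10, so the normalized degree is 60. A value of 50 would mean the child's network looks like a typical random network of the same size.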
Cluster analyses procedure: We used cluster-based permutation testing to compare normalized degree between nouns and verbs across vocabulary sizes in our sample (Maris & Oostenveld, 2007). This procedure identifies ranges of significantly different values in time-series-like data while controlling the family-wise error rate. Across children and for each network type, normalized degree for nouns and verbs was separately put into vocabulary size bins of 20 words. Within each network type, normalized degree for nouns and verbs was compared. Following the cluster-based permutation testing procedure, for each network type, individual t-tests were conducted within each bin. Significantly different comparisons were identified using a t threshold of 3.29. Contiguous ranges of significantly different bins were identified and the t statistics of those comparisons were added together; this value is termed the cluster mass t statistic for that cluster. Separately, the data were randomly shuffled within children and the t-test procedure above was repeated 10,000 times. Last, we compared the cluster mass t statistic for each identified cluster in the children's actual data to the distribution of cluster mass t statistics in the randomly shuffled data. This resulted in an empirical p value for each cluster that reflected how likely it was that the observed cluster would be as large or larger by random chance alone.
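A compact sketch of this kind of cluster-based permutation test is given below. Several details are assumptions made for illustration and are not stated in the text: the use of a paired t statistic on noun-minus-verb differences, shuffling implemented as random swaps of the noun and verb labels within each child, and a null distribution built from the largest absolute cluster mass per shuffle (the usual choice in Maris & Oostenveld, 2007). All names and the toy data are hypothetical.

```python
import math
import random
from statistics import mean, stdev

def paired_t(diffs):
    """One-sample t statistic on within-child differences (noun minus verb)."""
    sd = stdev(diffs)
    if sd == 0:
        return 0.0
    return mean(diffs) / (sd / math.sqrt(len(diffs)))

def cluster_masses(t_values, threshold):
    """Sum t over runs of contiguous bins with |t| > threshold and the same sign."""
    masses, current, sign = [], 0.0, 0
    for t in t_values:
        s = (t > threshold) - (t < -threshold)
        if s != 0 and s == sign:
            current += t
        else:
            if sign != 0:
                masses.append(current)
            current, sign = (t, s) if s != 0 else (0.0, 0)
    if sign != 0:
        masses.append(current)
    return masses

def cluster_permutation_test(bins, threshold=3.29, n_perm=1000, seed=0):
    """bins: one list per vocabulary-size bin of (noun, verb) values per child."""
    rng = random.Random(seed)
    observed = cluster_masses(
        [paired_t([n - v for n, v in b]) for b in bins], threshold)
    null = []
    for _ in range(n_perm):
        t_perm = []
        for b in bins:
            # Shuffle within child: randomly swap the noun and verb labels.
            diffs = [(n - v) * rng.choice((1, -1)) for n, v in b]
            t_perm.append(paired_t(diffs))
        masses = cluster_masses(t_perm, threshold)
        null.append(max((abs(m) for m in masses), default=0.0))
    # Empirical p: share of shuffles whose largest cluster is at least as big.
    p_values = [sum(m0 >= abs(m) for m0 in null) / n_perm for m in observed]
    return observed, p_values

# Toy data: nouns exceed verbs in the first two bins, no difference in the third.
data_rng = random.Random(1)
toy_bins = [
    [(50 + shift + data_rng.gauss(0, 5), 50 + data_rng.gauss(0, 5)) for _ in range(30)]
    for shift in (20.0, 20.0, 0.0)
]
observed, p_values = cluster_permutation_test(toy_bins, n_perm=200)
```

Summing t statistics over contiguous bins rewards effects that persist across neighboring vocabulary sizes, and comparing against the shuffled maximum is what keeps the family-wise error rate controlled across bins.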

Results
Table 3 reports the results of the cluster-based permutation testing procedure, comparing noun and verb normalized degree within each network type to identify ranges of significantly different values. Figure 4 shows the average normalized degree for nouns and verbs across children in networks created using encyclopedic, functional, perceptual, and taxonomic features. For encyclopedic features, children's vocabularies demonstrated significantly greater normalized degree for verbs than for nouns from 20 to 420 words, nearly the entire range of vocabulary development assessed on the MBCDI. For all other features, normalized degree for nouns tended to be larger than for verbs. Specifically, for functional features, two significant clusters were identified in which noun normalized degree was higher than verb normalized degree, one from 40 to 180 words and another from 360 to 380 words. For perceptual features, a significant cluster from 20 to 180 words was identified in which normalized degree was larger for nouns than for verbs. Last, for taxonomic features, there was a significant cluster from 20 to 380 words in which normalized degree was larger for nouns than for verbs.
Last, given the enhanced role that encyclopedic features played for verbs compared to nouns and evidence for the overall importance of perceptual features for nouns, we conducted a cluster-based permutation test comparing normalized degree for encyclopedic features to normalized degree for perceptual features. For nouns, there was a significant cluster from 20 to 440 words where normalized degree for encyclopedic features was below that for perceptual features, cluster-t = −503.8, p < .001. For verbs, normalized degree for encyclopedic features was also below that for perceptual features from 80 to 440 words, cluster-t = −124.2, p < .001.

Discussion
Verb learning is often considered to be more difficult than noun learning for young children because verbs' referents are often less imageable and concrete than nouns' referents (Ma et al., 2009; McDonough et al., 2011) and verbs may refer to complex transient events with many participants (Talmy, 1985). This study adds another potential reason for verbs' difficulty: verbs' meanings have substantial contributions from complex meanings that are not grounded in perception, object function, or taxonomic structure. Compared to nouns, verbs' meanings are more often built from encyclopedic features, subcomponents of meaning often related to social relationships, affect/emotion, and decontextualized knowledge (Cree & McRae, 2003). Moreover, despite the difficulty of verbs' meanings, children's early verb vocabularies demonstrated semantic relationships with other words through encyclopedic features in addition to perceptual features. Rather than avoiding the challenge of verbs' complex meanings, children systematically used these complex features to structure their vocabularies.
At the same time, perceptual information also supported the structure of both nouns and verbs. This finding provides some support for theories that prioritize the impact of perceptual information in early lexical representation, which have focused largely on evidence in nouns. For example, Quinn and Eimas (1997) argued that meaning representations are initially grounded in perceptual information, which later forms the basis for more advanced conceptual representations. The findings are also consistent with embodied cognition accounts, which posit that representations of word meaning are grounded in sensorimotor representations (Wellsby & Pexman, 2014).
While perceptual information supported the structure of both verb and noun networks, the structure of verb semantic networks was additionally driven by complex encyclopedic information. This pattern suggests that conceptual development may proceed differently in response to learning nouns compared to verbs. Given that verbs refer to entities that are not pre-individuated in the world (Gentner, 1982), fundamental differences between nouns and verbs may produce different perceptual-conceptual demands. Alternatively, it is possible that the shift from perceptual to conceptual information had already occurred in our sample; by the time these children had learned their first verbs, they had already learned many nouns.
These findings offer a rich starting point for future work. For example, future studies could determine whether there are subtypes of encyclopedic or perceptual features that exert disproportionately strong effects on children's vocabulary structure. Additionally, future work should directly address how the semantic-feature differences seen here affect noun and verb learning by, for instance, examining how age of acquisition of nouns and verbs differs as a function of these words' investment in perceptual vs. encyclopedic information.
In sum, this study offers a first look at the contribution of semantic feature type to children's early noun and verb semantics. While noun meanings tended to prioritize directly perceptible information, verb meanings reflected more complex encyclopedic information even from the very beginnings of vocabulary development.

Figure 3: Example of weighted degree. Edges used in the calculation for "puppy" are highlighted. If all edges have a weight of 2, then the weighted degree of "puppy" is 8.

Figure 1: Histograms of proportion of features of each feature type relative to the count of all features for each early-learned noun and verb on the MBCDI. Data are binned in intervals of 0.05.

Figure 2: Example semantic network with nouns in blue and verbs in red. Shared semantic features connect words. Edge weight is the number of shared features.

Figure 4: Normalized degree for nouns and verbs in semantic networks created using encyclopedic, functional, perceptual, and taxonomic features across 3,804 children's vocabularies. Horizontal dashed line shows expected degree in randomly generated networks of the same size. Values are binned in intervals of 20 words. POS = Part of speech.

Table 2: Mean proportions by feature type.